47 research outputs found

    On the closure of relational models

    Relational models for contingency tables are generalizations of log-linear models, allowing effects associated with arbitrary subsets of cells in a possibly incomplete table, and not necessarily containing the overall effect. In this generality, the MLEs under Poisson and multinomial sampling are not always identical. This paper deals with the theory of maximum likelihood estimation in the case when there are observed zeros in the data. A unique MLE for such data is shown to always exist in the set of pointwise limits of sequences of distributions in the original model. This set is equal to the closure of the original model with respect to the Bregman information divergence. The same variant of iterative scaling may be used to compute the MLE in the original model and in its closure.

    Faithfulness and learning hypergraphs from discrete distributions

    The concepts of faithfulness and strong-faithfulness are important for statistical learning of graphical models. Graphs are not sufficient for describing the association structure of a discrete distribution. Hypergraphs representing hierarchical log-linear models are considered instead, and the concept of parametric (strong-) faithfulness with respect to a hypergraph is introduced. Strong-faithfulness ensures the existence of uniformly consistent parameter estimators and enables building uniformly consistent procedures for a hypergraph search. The strength of association in a discrete distribution can be quantified with various measures, leading to different concepts of strong-faithfulness. Lower and upper bounds for the proportions of distributions that do not satisfy strong-faithfulness are computed for different parameterizations and measures of association. Comment: 23 pages, 6 figures.

    Relational models for contingency tables

    The paper considers general multiplicative models for complete and incomplete contingency tables that generalize log-linear and several other models and are entirely coordinate free. Sufficient conditions for the existence of maximum likelihood estimates under these models are given, and it is shown that the usual equivalence between multinomial and Poisson likelihoods holds if and only if an overall effect is present in the model. If such an effect is not assumed, the model becomes a curved exponential family, and a related mixed parameterization is given that relies on non-homogeneous odds ratios. Several examples are presented to illustrate the properties and use of such models.

    On the role of the overall effects in exponential families

    This paper compares exponential families of discrete probability distributions with and without the normalizing constant (or overall effect). The latter setup, in which the exponential family is curved, is particularly relevant when the sample space is an incomplete Cartesian product or when it is very large, so that the computational burden is significant. The lack or presence of the overall effect has a fundamental impact on the properties of the exponential family. When the overall effect is added, the family becomes the smallest regular exponential family containing the curved one. The procedure is related to the homogenization of an inhomogeneous variety discussed in algebraic geometry, of which a statistical interpretation is given as an augmentation of the sample space. The changes in the kernel basis representation when the overall effect is included or removed are derived. The geometry of maximum likelihood estimates, also allowing zero observed frequencies, is described with and without the overall effect, and various algorithms are compared. The importance of the results is illustrated by an example from cell biology, showing that routinely including the overall effect leads to estimates which are not in the model intended by the researchers. © 2018 Institute of Mathematical Statistics. All rights reserved.

    Entropy and Hausdorff Dimension in Random Growing Trees

    We investigate the limiting behavior of random tree growth in preferential attachment models. The tree stems from a root, and we add vertices to the system one by one at random, according to a rule which depends on the degree distribution of the already existing tree. The so-called weight function, in terms of which the rule of attachment is formulated, is such that each vertex in the tree can have at most K children. We define a random measure mu on the leaves of the limiting tree, which captures a global property of the tree growth in a natural way. We prove that the Hausdorff and the packing dimensions of this limiting measure are equal and constant with probability one. Moreover, the local dimension of mu equals the Hausdorff dimension at mu-almost every point. We give an explicit formula for the dimension, given the rule of attachment.
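The growth rule in the abstract above can be illustrated with a small simulation. This is only a sketch under assumptions not stated in the listing: each new vertex attaches to an existing vertex with probability proportional to a weight of that vertex's current number of children, and the linear weight `K - k` used here is a hypothetical choice that merely respects the "at most K children" constraint. The names `grow_tree` and `weight` are illustrative, not taken from the paper.

```python
import random


def grow_tree(n_vertices, weight, seed=0):
    """Grow a random tree one vertex at a time.

    Returns (parent, children): parent[i] is the parent of vertex i
    (None for the root) and children[i] is its number of children.
    """
    rng = random.Random(seed)
    parent = [None]   # vertex 0 is the root
    children = [0]    # current number of children of each vertex
    for _ in range(n_vertices - 1):
        # Attachment probability is proportional to weight(#children).
        weights = [weight(c) for c in children]
        chosen = rng.choices(range(len(children)), weights=weights, k=1)[0]
        parent.append(chosen)
        children[chosen] += 1
        children.append(0)  # the new vertex starts as a leaf
    return parent, children


K = 2  # assumed cap on the number of children per vertex


def weight(k):
    # Hypothetical weight function: positive below K, zero once K is reached,
    # so a vertex with K children can never be chosen again.
    return float(K - k) if k < K else 0.0


parent, children = grow_tree(200, weight)
```

Because every newly added leaf has positive weight, the total weight never drops to zero and the sampling step is always well defined.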

    Iterative Scaling in Curved Exponential Families

    The paper describes a generalized iterative proportional fitting procedure that can be used for maximum likelihood estimation in a special class of the general log-linear model. The models in this class, called relational, apply to multivariate discrete sample spaces that do not necessarily have a Cartesian product structure and may not contain an overall effect. When applied to the cell probabilities, the models without the overall effect are curved exponential families and the values of the sufficient statistics are reproduced by the MLE only up to a constant of proportionality. The paper shows that Iterative Proportional Fitting, Generalized Iterative Scaling, and Improved Iterative Scaling fail to work for such models. The algorithm proposed here is based on iterated Bregman projections. As a by-product, estimates of the multiplicative parameters are also obtained. An implementation of the algorithm is available as an R package.
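For orientation, the classical Iterative Proportional Fitting step that the abstract above takes as its starting point can be sketched as follows. This is the textbook procedure for a complete two-way table under the independence model, not the generalized algorithm proposed in the paper; the function name `ipf` and the example counts are made up for illustration.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Classical IPF: rescale rows and columns in turn until both margins match."""
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Row step: scale each row so its sum equals the target row total.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Column step: scale each column so its sum equals the target column total.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns match exactly after the column step, so check the rows.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table


# Illustrative 2x2 table of counts (made-up data).
observed = np.array([[10.0, 20.0], [30.0, 40.0]])
fitted = ipf(np.ones_like(observed), observed.sum(axis=1), observed.sum(axis=0))
print(fitted)
```

Starting from a table of ones, the fit reproduces both observed margins exactly, which is the usual behavior when an overall effect is present. According to the abstract, models without the overall effect reproduce the sufficient statistics only up to a constant of proportionality, which is why this classical scheme does not carry over to them.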